
    Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods

    This study uses stacked generalization, a two-step process for combining machine learning methods. In the first step, the error rate of each individual algorithm is minimized to reduce its bias on the learning set; in the second step, the results are input to a meta (or super) learner, whose stacked, blended output improves performance, with even the weakest algorithms learning better. The method is essentially an enhanced cross-validation strategy. Although the process demands substantial computational resources, the resulting performance metrics on resampled fraud data show that the increased system cost can be justified. A fundamental challenge of fraud data is that it is inherently unsystematic, and the optimal resampling methodology has not yet been identified. Building a test harness that accounts for all permutations of algorithm/sample-set pairs ensures that the complex, intrinsic data structures are thoroughly tested. A comparative analysis applying stacked generalization to fraud data provides the insight needed to find the optimal mathematical formulation for imbalanced fraud data sets.
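
    As an illustration of the two-step process described above, the following is a minimal sketch of stacked generalization on a resampled, imbalanced data set. It assumes scikit-learn and imbalanced-learn are available; the base learners, meta learner, and SMOTE resampling are illustrative choices, not the study's actual configuration.

```python
# Minimal stacked-generalization sketch for an imbalanced data set.
# Models and SMOTE are illustrative stand-ins, not the study's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import SMOTE

# Simulate an imbalanced "fraud" data set (~1% positive class).
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Resample the training split only; the test split stays untouched.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Step one: base learners trained on the learning set.
# Step two: a meta learner fit on their cross-validated predictions.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,  # the enhanced cross-validation at the heart of stacking
)
stack.fit(X_res, y_res)
print(f"held-out accuracy: {stack.score(X_te, y_te):.3f}")
```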

    Cyber Creative Generative Adversarial Network for Novel Malicious Packets

    Machine learning (ML) requires both quantity and variety of examples in order to learn generalizable patterns. In cybersecurity, labeling network packets is a tedious and difficult task, which leads to insufficient labeled datasets for training ML-based Network Intrusion Detection Systems (NIDS) to detect malicious intrusions. Furthermore, benign network traffic and malicious cyber attacks are always evolving, so existing datasets quickly become obsolete. We investigate generative ML modeling for synthetic network packet data generation and augmentation to improve NIDS detection of novel but similar cyber attacks by generating well-labeled synthetic network traffic. We develop a Cyber Creative Generative Adversarial Network (CCGAN), inspired by previous generative modeling that creates new art styles from existing art images, and train it on existing NIDS datasets to generate new synthetic network packets. The goal is to create network packet payloads that appear malicious but come from distributions different from the original cyber attack classes. We use these new synthetic malicious payloads to augment the training of an ML-based NIDS and evaluate whether it is better at correctly identifying whole classes of real malicious packet payloads that were held out during classifier training. Results show that data augmentation with CCGAN can increase a NIDS baseline's accuracy on a novel malicious class from 79% to 97% with minimal degradation in accuracy on benign classes (98.9% to 98.7%).
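
    For orientation, the following is a minimal PyTorch sketch of the adversarial training loop at the core of any GAN-based payload generator. The byte-vector payload representation, network sizes, and training schedule are illustrative assumptions, not the CCGAN architecture from the paper.

```python
# Minimal GAN training step for synthetic payload generation (sketch).
# Payloads are modeled as fixed-length byte vectors scaled to [0, 1];
# this representation and the layer sizes are assumptions.
import torch
import torch.nn as nn

PAYLOAD_LEN, NOISE_DIM = 128, 32

G = nn.Sequential(nn.Linear(NOISE_DIM, 256), nn.ReLU(),
                  nn.Linear(256, PAYLOAD_LEN), nn.Sigmoid())
D = nn.Sequential(nn.Linear(PAYLOAD_LEN, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))  # raw logit; loss applies sigmoid

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_payloads: torch.Tensor) -> None:
    """One adversarial update: discriminator first, then generator."""
    batch = real_payloads.size(0)
    fake = G(torch.randn(batch, NOISE_DIM))

    # Discriminator: score real payloads as 1, generated payloads as 0.
    d_loss = (bce(D(real_payloads), torch.ones(batch, 1)) +
              bce(D(fake.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: push the discriminator toward scoring fakes as 1.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```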

    Military and Security Applications: Cybersecurity (Encyclopedia of Optimization, Third Edition)

    The domain of cybersecurity is growing as part of broader military and security applications, and the capabilities and processes in this realm have qualities and characteristics that warrant solution methods from mathematical optimization. Problems of interest may involve continuous or discrete variables, a convex or non-convex decision space, differing levels of uncertainty, and constrained or unconstrained frameworks. Cyberattacks, for example, can be modeled using hierarchical threat structures and may involve decision strategies from both an organization or individual and the adversary. Network traffic flow, intrusion detection and prevention systems, interconnected human-machine interfaces, and automated systems all require higher levels of complexity in mathematical optimization modeling and analysis. Attributes such as cyber resiliency, network adaptability, security capability, and information technology flexibility require the measurement of multiple characteristics, many of which may involve both quantitative and qualitative interpretations. And for nearly every organization invested in some cybersecurity practice, decisions must be made that balance the competing objectives of cost, risk, and performance. As such, mathematical optimization has been widely used and accepted to model important and complex decision problems, providing analytical evidence to help drive decision outcomes in cybersecurity applications. This chapter highlights some of the recent mathematical optimization research applied to the cybersecurity space. The literature discussed fits within a broader cybersecurity domain taxonomy comprising the categories analyze; collect and operate; investigate; operate and maintain; oversee and govern; protect and defend; and securely provision. Further, the discussion is structured around generalized mathematical optimization categories, including uncertainty (stochastic programming, robust optimization, etc.), discrete (integer programming, multiobjective, etc.), continuous-unconstrained (nonlinear least squares, etc.), and continuous-constrained (global optimization, nonlinear programming, network optimization, linear programming, etc.). The chapter concludes with research implications and extensions for readers who wish to pursue further mathematical optimization research for cybersecurity within a broader military and security applications context.
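
    As a concrete, toy instance of the cost/risk/performance trade-off mentioned above, the following is a small linear program in SciPy. All coefficients are invented for illustration: fractional investment levels in three hypothetical security controls are chosen to maximize risk reduction under a budget.

```python
# Toy LP: allocate a security budget across three controls (sketch).
# Coefficients are invented; linprog minimizes, so risk reduction is
# negated in the objective.
from scipy.optimize import linprog

risk_reduction = [-4.0, -2.5, -1.0]   # negated per-unit risk reduction
cost = [3.0, 1.5, 0.8]                # cost per unit of each control
budget = 5.0

res = linprog(c=risk_reduction,
              A_ub=[cost], b_ub=[budget],
              bounds=[(0, 1)] * 3)    # each control deployed 0..100%
print("investment levels:", res.x)
print("total risk reduced:", -res.fun)
```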

    Intelligent Feature Engineering for Cybersecurity

    Feature engineering and selection is a critical step in the implementation of any machine learning system. In application areas such as intrusion detection for cybersecurity, this task is complicated by the diverse data types and ranges presented in both raw data packets and derived data fields. Additionally, the time- and context-specific nature of the data requires domain expertise to properly engineer the features while minimizing any potential information loss. Many previous efforts in this area naively apply feature engineering techniques that are successful in image recognition applications. In this work, we use network packet dataflows from the Defense Research and Engineering Network (DREN) and the Engineer Research and Development Center's (ERDC) high performance computing systems to experimentally analyze various methods of feature engineering. The results of this research provide insight into the suitability of the features for machine learning-based cybersecurity applications.
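
    To make the mixed-data-type problem concrete, the following is a minimal scikit-learn sketch of preprocessing heterogeneous flow fields before classification. The field names are hypothetical stand-ins, not the DREN/ERDC schema, and the transformer choices are illustrative.

```python
# Sketch: handling mixed categorical/continuous packet-flow fields.
# Column names are hypothetical; transformers are illustrative.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

flows = pd.DataFrame({
    "protocol": ["tcp", "udp", "tcp"],   # categorical field
    "duration_s": [0.2, 13.5, 1.1],      # continuous, wide range
    "bytes_sent": [4200, 120, 88000],    # heavy-tailed count
})

pre = ColumnTransformer([
    ("proto", OneHotEncoder(handle_unknown="ignore"), ["protocol"]),
    ("nums", StandardScaler(), ["duration_s", "bytes_sent"]),
])
X = pre.fit_transform(flows)  # feature matrix for an ML-based NIDS
print(X.shape)
```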

    Deep VULMAN: A Deep Reinforcement Learning-Enabled Cyber Vulnerability Management Framework

    Cyber vulnerability management is a critical function of a cybersecurity operations center (CSOC) that helps protect organizations against cyber-attacks on their computer and network systems. Adversaries hold an asymmetric advantage over the CSOC: the number of deficiencies in these systems is increasing at a significantly higher rate than security teams can expand to mitigate them in a resource-constrained environment. Current approaches are deterministic, one-time decision-making methods that do not consider future uncertainties when prioritizing and selecting vulnerabilities for mitigation. These approaches are also constrained by a sub-optimal distribution of resources, providing no flexibility to adjust their response to fluctuations in vulnerability arrivals. We propose a novel framework, Deep VULMAN, consisting of a deep reinforcement learning agent and an integer programming method, to fill this gap in the cyber vulnerability management process. Our sequential decision-making framework first determines the near-optimal amount of resources to allocate for mitigation under uncertainty for a given system state and then determines the optimal set of prioritized vulnerability instances to mitigate. Our proposed framework outperforms current methods in prioritizing the selection of important organization-specific vulnerabilities on both simulated and real-world vulnerability data observed over a one-year period.
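
    As a sketch of the second stage described above, the following 0/1 integer program selects which vulnerabilities to mitigate under a resource budget (in the framework, that budget is what the DRL agent chooses). It uses PuLP; the severity scores, effort estimates, and budget are invented for illustration.

```python
# Sketch: vulnerability selection as a knapsack-style integer program.
# Assumes PuLP; all numbers are invented for illustration.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

severity = [9.8, 7.5, 6.1, 9.1, 4.3]   # CVSS-like importance scores
effort   = [8.0, 3.0, 2.0, 6.0, 1.0]   # analyst-hours to mitigate
budget   = 10.0                         # hours allocated (by the agent)

prob = LpProblem("vuln_prioritization", LpMaximize)
x = [LpVariable(f"fix_{i}", cat="Binary") for i in range(len(severity))]
prob += lpSum(s * xi for s, xi in zip(severity, x))          # objective
prob += lpSum(e * xi for e, xi in zip(effort, x)) <= budget  # resources
prob.solve()
print("mitigate:", [i for i, xi in enumerate(x) if value(xi) == 1])
```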

    Adversarial Machine Learning in Network Intrusion Detection Systems

    Adversarial examples are inputs to a machine learning system intentionally crafted by an attacker to fool the model into producing an incorrect output. Such examples have achieved a great deal of success in several domains, including image recognition, speech recognition, and spam detection. In this paper, we study the nature of the adversarial problem in Network Intrusion Detection Systems (NIDS). We focus on the attack perspective, which includes techniques to generate adversarial examples capable of evading a variety of machine learning models. More specifically, we explore the use of evolutionary computation (particle swarm optimization and genetic algorithms) and deep learning (generative adversarial networks) as tools for adversarial example generation. To assess the performance of these algorithms in evading a NIDS, we apply them to two publicly available data sets, NSL-KDD and UNSW-NB15, and contrast them with a baseline perturbation method: Monte Carlo simulation. The results show that our adversarial example generation techniques cause high misclassification rates in eleven different machine learning models, along with a voting classifier. Our work highlights the vulnerability of machine learning-based NIDS in the face of adversarial perturbation.
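
    For intuition, the following is a minimal sketch of the Monte Carlo perturbation baseline mentioned above: randomly perturb a malicious sample's feature vector until a classifier flips its prediction. The model interface (scikit-learn style) and the unit-interval feature bounds are illustrative assumptions.

```python
# Sketch of random (Monte Carlo) perturbation for evasion.
# Assumes features scaled to [0, 1] and a sklearn-style classifier
# where class 0 means "benign"; both are assumptions.
import numpy as np

def monte_carlo_evade(model, x, eps=0.1, tries=1000, rng=None):
    """Return a perturbed copy of x the model scores benign, or None."""
    rng = rng or np.random.default_rng(0)
    for _ in range(tries):
        candidate = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0, 1)
        if model.predict(candidate.reshape(1, -1))[0] == 0:
            return candidate  # evasion found
    return None  # no evading perturbation within the budget
```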

    Data-Efficient, Federated Learning for Raw Network Traffic Detection

    Traditional machine learning (ML) models used for enterprise network intrusion detection systems (NIDS) typically rely on vast amounts of centralized data with expertly engineered features. Previous work, however, has shown the feasibility of using deep learning (DL) to detect malicious activity from raw network traffic payloads rather than engineered features at the edge, which is necessary for tactical military environments. In the future Internet of Battlefield Things (IoBT), the military will operate in multiple environments with disconnected networks spread across the battlefield. These resource-constrained, data-limited networks require distributed and collaborative ML/DL models for inference that are continually trained both locally, using data from each separate tactical edge network, and then globally, in order to learn and detect malicious activity represented across the multiple networks in a collaborative fashion. Federated Learning (FL), a collaborative paradigm that updates and distributes a global model through local model weight aggregation, provides a way to train ML/DL models for NIDS across multiple edge devices on disparate networks without sharing raw data. We develop and experiment with a data-efficient FL framework for intrusion detection in IoBT settings using only raw network traffic in restricted, resource-limited environments. Our results indicate that, regardless of the DL model architecture used on the edge devices, the Federated Averaging algorithm achieved over 93% accuracy in detecting malicious payloads after only five episodes of FL training.
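
    The Federated Averaging step named above is a weighted average of locally trained model parameters. The following framework-agnostic NumPy sketch shows the server-side aggregation; the parameter-dict layout is an illustrative assumption.

```python
# Sketch of the server-side Federated Averaging (FedAvg) update:
# average per-client weights, weighted by each client's sample count.
# The dict-of-arrays parameter layout is an assumption.
import numpy as np

def fed_avg(local_weights: list[dict], n_samples: list[int]) -> dict:
    """Weighted average of per-client parameter dicts (name -> array)."""
    total = sum(n_samples)
    return {
        name: sum((n / total) * w[name]
                  for w, n in zip(local_weights, n_samples))
        for name in local_weights[0]
    }

# Usage: after each FL episode, broadcast fed_avg(client_updates, counts)
# back to the edge devices for the next round of local training.
```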

    Constrained Optimization Based Adversarial Example Generation for Transfer Attacks in Network Intrusion Detection Systems

    Deep learning has enabled network intrusion detection rates as high as 99.9% for malicious network packets without requiring feature engineering. Adversarial machine learning methods have been used to evade classifiers in the computer vision domain; however, existing methods do not translate well into the constrained cyber domain, as they tend to produce non-functional network packets. This research views the payload of a network packet as code with many functional units. A meta-heuristic-based generative model is developed to maximize the classification loss of packet payloads with respect to a surrogate model by repeatedly substituting units of code with functionally equivalent counterparts. The perturbed packets are then transferred and tested against three test network intrusion detection system classifiers, with evasion rates that vary by classifier and malicious packet type. If the test classifier has the same architecture as the surrogate model, near-optimal adversarial examples penetrate the test model for 69% of packets, whereas the raw examples succeed for only 5% of packets. This confirms the hypothesis that NIDS classifiers are vulnerable to adversarial attacks and motivates research in robust learning for the cyber domain.
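
    The following is a minimal sketch of the functionality-preserving substitution idea described above: greedily swap units of a payload for equivalent variants, keeping any swap that raises the surrogate model's loss. The equivalence table and the surrogate loss callable are hypothetical placeholders, not the paper's generative model.

```python
# Sketch: greedy functionally-equivalent substitution against a
# surrogate classifier. EQUIVALENTS and surrogate_loss are hypothetical.
import random

EQUIVALENTS = {
    b"cmd.exe /c": [b"cmd /c", b"cmd.exe /k"],   # illustrative pairs
    b"%temp%": [b"%TEMP%"],
}

def perturb_payload(payload: bytes, surrogate_loss, steps: int = 50) -> bytes:
    """Keep substitutions that increase the surrogate's loss."""
    best, best_loss = payload, surrogate_loss(payload)
    for _ in range(steps):
        hits = [u for u in EQUIVALENTS if u in best]
        if not hits:
            break  # no substitutable units remain
        unit = random.choice(hits)
        candidate = best.replace(unit, random.choice(EQUIVALENTS[unit]), 1)
        loss = surrogate_loss(candidate)
        if loss > best_loss:  # functionality preserved, loss increased
            best, best_loss = candidate, loss
    return best
```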